An Approach to Classify Semi-structured Objects

Authors

  • Elisa Bertino
  • Giovanna Guerrini
  • Isabella Merlo
  • Marco Mesiti

Abstract

Several advanced applications, such as those dealing with the Web, need to handle data whose structure is not known a priori. This requirement severely limits the applicability of traditional database techniques, which assume that the structure of the data (i.e., the database schema) is known before data are entered into the database. Moreover, in traditional database systems, whenever a data item (e.g., a tuple or an object) is entered, the application specifies the collection (e.g., a relation or a class) to which the data item belongs. Collections are the basis for query processing and indexing, so a proper classification of data items into collections is crucial. In this paper, we address this issue in the context of an extended object-oriented data model. We propose an approach that classifies objects, created without specifying the class they belong to, into the most appropriate class of the schema, that is, the class closest to the object's state. In particular, we introduce the notion of weak membership of an object in a class. We also define two measures, the conformity degree and the heterogeneity degree. Our classification algorithm uses these measures to identify the most appropriate class for an object among the classes of which it is a weak member.
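The abstract names weak membership, a conformity degree and a heterogeneity degree, but it does not give their formal definitions. The Python sketch below only illustrates the overall shape of the classification step. It rests on simple set-based assumptions that are not taken from the paper. An object and a class are each modeled as a set of attribute names. The helpers `is_weak_member`, `conformity`, `heterogeneity` and `classify`, the `SchemaClass` type, and their formulas are all hypothetical stand-ins.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaClass:
    """Hypothetical schema class: a name plus the attribute names it declares."""
    name: str
    attributes: frozenset[str]


def is_weak_member(obj_attrs: set[str], cls: SchemaClass) -> bool:
    # ASSUMPTION: an object is a weak member of a class if it shares
    # at least one attribute with it (not the paper's definition).
    return bool(obj_attrs & cls.attributes)


def conformity(obj_attrs: set[str], cls: SchemaClass) -> float:
    # ASSUMPTION: fraction of the class's attributes present in the object.
    if not cls.attributes:
        return 0.0
    return len(obj_attrs & cls.attributes) / len(cls.attributes)


def heterogeneity(obj_attrs: set[str], cls: SchemaClass) -> float:
    # ASSUMPTION: fraction of the object's attributes the class does not declare.
    if not obj_attrs:
        return 0.0
    return len(obj_attrs - cls.attributes) / len(obj_attrs)


def classify(obj_attrs: set[str], schema: list[SchemaClass]) -> str | None:
    """Pick, among the classes the object is a weak member of, the one with
    the highest conformity, breaking ties by the lowest heterogeneity.
    Returns None when the object is a weak member of no class."""
    candidates = [c for c in schema if is_weak_member(obj_attrs, c)]
    if not candidates:
        return None
    best = max(
        candidates,
        key=lambda c: (conformity(obj_attrs, c), -heterogeneity(obj_attrs, c)),
    )
    return best.name
```

Take a schema with `Person {name, address}`, `Employee {name, address, salary}` and `Car {plate, model}`. An object with attributes `{name, address, salary, hobby}` fully conforms to both `Person` and `Employee`. It is less heterogeneous with respect to `Employee` (1 of 4 attributes undeclared, versus 2 of 4 for `Person`), so this sketch picks `Employee`. Ranking on conformity first and heterogeneity second is itself an assumption. The paper's algorithm may combine the two measures differently.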

Similar Resources

Handling Semi-Structured Data through an Extended Object-Oriented Data

In traditional database applications the structure of data is predefined, and data are entered into the database specifying the schema element (relation or class, depending on the paradigm) they belong to. New emerging database applications, especially those related to the Web, are characterized by data that have an irregular, heterogeneous, partial structure that quickly evolves. In this paper...

A Framework to identify and classify human resources strategic directions in Iran oil industry using qualitative approach

This study was conducted to identify the human resource strategic directions in each main subsection of HR in the Iran oil industry. Sources such as upstream documents, business strategies, trends, adaptive comparison, historical documents, and semi-structured interviews were used to gather information. The research method was qualitative and content analysis was used for data anal...

Semi-Supervised Learning Based Prediction of Musculoskeletal Disorder Risk

This study explores a semi-supervised classification approach using random forest as a base classifier to classify the low-back disorders (LBDs) risk associated with the industrial jobs. Semi-supervised classification approach uses unlabeled data together with the small number of labelled data to create a better classifier. The results obtained by the proposed approach are compared with those o...

Top-down Extraction of Semi-Structured Data

In this paper, we propose an innovative approach to extracting semi-structured data from Web sources. The idea is to collect a couple of example objects from the user and to use this information to extract new objects from new pages or texts. We propose a top-down strategy that extracts complex objects by decomposing them into less complex objects, until atomic objects have been extracted. Through e...

Discovering Frequent Substructures from Hierarchical Semi-structured Data

Frequent substructure discovery from a collection of semi-structured objects can serve for storage, browsing, querying, indexing and classification of semi-structured documents. This paper examines the problem of discovering frequent substructures from a collection of hierarchical semi-structured objects of the same type. The use of wildcard is an important aspect of substructure discovery from...

Processing Semi-Structured Data in Object Bases

We address the problem of null values and other forms of semi-structured data in object-oriented databases. Various aspects and issues concerning semi-structured data that are currently presented in the literature are discussed in the paper. We propose a new universal approach to semi-structured data based on the idea of absent objects. The idea covers null values and union types and can be smo...

Publication date: 1999